2,327 research outputs found

    Visual Entailment: A Novel Task for Fine-Grained Image Understanding

    Existing visual reasoning datasets, such as Visual Question Answering (VQA), often suffer from biases conditioned on the question, image, or answer distributions. The recently proposed CLEVR dataset addresses these limitations and requires fine-grained reasoning, but it is synthetic and consists of similar objects and sentence structures throughout. In this paper, we introduce a new inference task, Visual Entailment (VE), consisting of image-sentence pairs in which the premise is defined by an image rather than by a natural language sentence, as in traditional Textual Entailment tasks. The goal of a trained VE model is to predict whether the image semantically entails the text. To realize this task, we build a dataset, SNLI-VE, based on the Stanford Natural Language Inference corpus and the Flickr30k dataset. We evaluate various existing VQA baselines and build a model called the Explainable Visual Entailment (EVE) system to address the VE task. EVE achieves up to 71% accuracy and outperforms several other state-of-the-art VQA-based models. Finally, we demonstrate the explainability of EVE through cross-modal attention visualizations. The SNLI-VE dataset is publicly available at https://github.com/necla-ml/SNLI-VE

    Visual Entailment Task for Visually-Grounded Language Learning

    We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) tasks in that the premise is defined by an image rather than by a natural language sentence. A novel dataset, SNLI-VE (publicly available at https://github.com/necla-ml/SNLI-VE), is proposed for VE tasks, based on the Stanford Natural Language Inference corpus and Flickr30k. We introduce a differentiable architecture called the Explainable Visual Entailment model (EVE) to tackle the VE problem. EVE and several other state-of-the-art visual question answering (VQA) based models are evaluated on the SNLI-VE dataset, facilitating grounded language understanding and providing insights into how modern VQA-based models perform. Comment: 4 pages, accepted by Visually Grounded Interaction and Language (ViGIL) workshop in NeurIPS 201

    COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality

    Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning about actors and objects. We approach the task by modeling the video as tokens that represent the multi-scale semantic concepts in the video. We propose COMPOSER, a Multiscale Transformer-based architecture that performs attention-based reasoning over tokens at each scale and learns group activity compositionally. In addition, prior works suffer from scene biases and raise privacy and ethical concerns. We use only the keypoint modality, which reduces scene biases and avoids acquiring detailed visual data that may contain private or biased information about users. We improve the multiscale representations in COMPOSER by clustering the intermediate-scale representations while maintaining consistent cluster assignments between scales. Finally, we use techniques such as auxiliary prediction and data augmentations tailored to the keypoint signals to aid model training. We demonstrate the model's strength and interpretability on two widely used datasets (Volleyball and Collective Activity). COMPOSER achieves up to +5.4% improvement with just the keypoint modality. Code is available at https://github.com/hongluzhou/composer Comment: ECCV 202

    Identification and phylogenetic comparison of p53 in two distinct mussel species (Mytilus)

    Author Posting. © The Authors, 2005. This is the author's version of the work. It is posted here by permission of Elsevier B. V. for personal use, not for redistribution. The definitive version was published in Comparative Biochemistry and Physiology Part C: Toxicology & Pharmacology 140 (2005): 237-250, doi:10.1016/j.cca.2005.02.011. The extent to which humans and wildlife are exposed to anthropogenic challenges is an important focus of environmental research. Potential use of p53 gene family marker(s) for aquatic environmental effects monitoring is the long-term goal of this research. The p53 gene is a tumor suppressor gene that is fundamental in cell cycle control and apoptosis. It is mutated or differentially expressed in about 50% of all human cancers and p53 family members are differentially expressed in leukemic clams. Here, we report the identification and characterization of the p53 gene in two species of Mytilus, Mytilus edulis and Mytilus trossulus, using RT-PCR with degenerate and specific primers to conserved regions of the gene. The Mytilus p53 proteins are 99.8% identical and closely related to clam (Mya) p53. In particular, the 3′ untranslated regions were examined to gain understanding of potential post-transcriptional regulatory pathways of p53 expression. We found nuclear and cytoplasmic polyadenylation elements, adenylate/uridylate-rich elements, and a K-box motif previously identified in other, unrelated genes. We also identified a new motif in the p53 3′UTR which is highly conserved across vertebrate and invertebrate species. Differences between the p53 genes of the two Mytilus species may be part of genetic determinants underlying variation in leukemia prevalence and/or development, but this requires further investigation. In conclusion, the conserved regions in these p53 paralogues may represent potential control points in gene expression. This information provides a critical first step in the evaluation of p53 expression as a potential marker for environmental assessment. AFM was supported by the Greater Vancouver Regional District, BC, Canada, and RLC was supported by STAR grant R82935901 from the Environmental Protection Agency (USA)

    Phosphoproteomics reveals that Parkinson’s disease kinase LRRK2 regulates a subset of Rab GTPases

    Mutations in Park8, encoding the multidomain Leucine-rich repeat kinase 2 (LRRK2) protein, comprise the predominant genetic cause of Parkinson's disease (PD). G2019S, the most common amino acid substitution, activates the kinase two- to threefold. This has motivated the development of LRRK2 kinase inhibitors; however, poor consensus on physiological LRRK2 substrates has hampered clinical development of such therapeutics. We employ a combination of phosphoproteomics, genetics, and pharmacology to unambiguously identify a subset of Rab GTPases as key LRRK2 substrates. LRRK2 directly phosphorylates these both in vivo and in vitro on an evolutionarily conserved residue in the switch II domain. Pathogenic LRRK2 variants mapping to different functional domains increase phosphorylation of Rabs, and this strongly decreases their affinity for regulatory proteins including Rab GDP dissociation inhibitors (GDIs). Our findings uncover a key class of bona fide LRRK2 substrates and a novel regulatory mechanism of Rabs that connects them to PD